Attention mechanism
See Transformer model, BERT, Recurrent neural network
Compared with the Sentence embedding approach, the attention mechanism allows the model to retain information from longer sentences. Instead of compressing the whole input into a single fixed vector, the context vector is generated dynamically at each decoding step through shortcut connections to the words of the input sentence.
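A minimal sketch of how such a dynamic context vector can be computed, assuming dot-product scoring between a decoder query and the encoder hidden states (Bahdanau's original paper uses an additive score instead; the function and variable names here are illustrative, not from any library):

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_context(query, encoder_states):
    """Weighted sum over encoder states, weights derived from the query.

    query:          (d,)   current decoder state
    encoder_states: (T, d) one hidden state per input word
    """
    scores = encoder_states @ query      # (T,) similarity to each input word
    weights = softmax(scores)            # attention distribution over the input
    context = weights @ encoder_states   # (d,) dynamic context vector
    return context, weights

# toy example: an input sentence of 4 words, hidden size 3
rng = np.random.default_rng(0)
H = rng.normal(size=(4, 3))
q = rng.normal(size=(3,))
ctx, w = attention_context(q, H)
```

The weights form a probability distribution over the input words, so the context vector is rebuilt for every query rather than fixed once per sentence.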
Variations
Libraries and code
Tutorials and articles
- Lil’Log: Attention? Attention!
- CMU Neural Nets for NLP 2017 (9): Attention
- The math behind Attention: Keys, Queries, and Values matrices
- Understanding and Coding Self-Attention, Multi-Head Attention, Cross-Attention, and Causal-Attention in LLMs
References
- Bahdanau2015: Neural Machine Translation by Jointly Learning to Align and Translate - the earliest attention mechanism in Deep learning?
- Attention Is All You Need